Add initial LoRA finetuning support; vulkan OUT_PROD; vulkan cross-entropy-backward #5
base: temp-finetuning
Conversation
Steps to test llama.cpp inference on Android: make sure to check out the …

For testing I'll reference the updated README, which documents the command we used for testing: https://github.com/tetherto/qvac-ext-lib-llama.cpp/blob/bc7dd9f9288222394da37eac3d7adf71d409ad83/examples/training/README.md#using-trained-adapters
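For reference, here is a minimal sketch of what "using a trained adapter" looks like through the llama.cpp C API. This is an illustration, not the exact test command from the PR: the file paths are placeholders, and the adapter function names follow recent upstream llama.h (older versions spell them llama_lora_adapter_*).

```cpp
// Hedged sketch: load a base model, attach a finetuned LoRA adapter, clean up.
// Paths are placeholders; adapter API names vary across llama.cpp versions.
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("base-model.gguf", mparams);

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // Attach the adapter produced by finetuning; scale 1.0f applies it at full strength.
    llama_adapter_lora * adapter = llama_adapter_lora_init(model, "trained-adapter.gguf");
    llama_set_adapter_lora(ctx, adapter, 1.0f);

    // ... run inference as usual ...

    llama_adapter_lora_free(adapter);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```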
Changes LGTM in general, just some small comments/nits overall; feel free to ignore the nitpicks :).
device->device.destroyBuffer(buffer);
device->device.freeMemory(device_memory);
The change looks good, but the commit message should be updated to remove "wip". It would also be good to explain in the commit message what specific crash this fixes.
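For context on the fix: the teardown order matters here. A minimal sketch of the pattern the two lines above implement, using Vulkan-Hpp (the helper name is ours, not this PR's):

```cpp
#include <vulkan/vulkan.hpp>

// Sketch of the cleanup order shown in the diff: destroy the vk::Buffer first,
// then free the vk::DeviceMemory backing it, and only once the device is done
// using them. Freeing memory the device is still using is a classic source of
// teardown crashes.
static void destroy_bound_buffer(vk::Device device, vk::Buffer & buffer, vk::DeviceMemory & memory) {
    if (buffer) {
        device.destroyBuffer(buffer);
        buffer = nullptr;
    }
    if (memory) {
        device.freeMemory(memory);
        memory = nullptr;
    }
}
```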
@@ -93,4 +93,4 @@ int main(int argc, char ** argv) {
    llama_backend_free();

    return 0;
}
nit: nothing changed, I'd drop it from the commit.
@@ -3202,7 +3202,8 @@ static bool ggml_backend_cuda_device_supports_op(ggml_backend_dev_t dev, const g
            }
        } break;
        case GGML_OP_OUT_PROD:
-           return op->type == GGML_TYPE_F32 && op->src[0]->type == GGML_TYPE_F32 && op->src[1]->type == GGML_TYPE_F32;
+           // return op->type == GGML_TYPE_F32 && op->src[0]->type == GGML_TYPE_F32 && op->src[1]->type == GGML_TYPE_F32;
Any reason to keep this prev code? We can always check git history if we need to revert.
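On the substance: the diff above only shows the old F32-only check being commented out, so as one plausible shape for the replacement (an assumption, not the PR's actual new line), the check could accept quantized src0 that the backend dequantizes before the outer product:

```cpp
#include "ggml.h"

// Hypothetical relaxed OUT_PROD support check, not this PR's exact code:
// keep requiring an F32 result and F32 src1, but also accept quantized src0,
// which the backend would dequantize to F32 before computing the outer product.
static bool out_prod_supported(const ggml_tensor * op) {
    return op->type == GGML_TYPE_F32 &&
           (op->src[0]->type == GGML_TYPE_F32 || ggml_is_quantized(op->src[0]->type)) &&
           op->src[1]->type == GGML_TYPE_F32;
}
```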
-   const float * src0_d = (const float *) src0->data;
-   const float * src1_d = (const float *) src1->data;
+   // const float * src0_d = (const float *) src0->data;
+   // const float * src1_d = (const float *) src1->data;
Same here and in other places; I would drop the old code in general.
    if (allocated_src0) {
        CUDA_CHECK(cudaFreeAsync(src0_f32, stream));
        // printf("DEBUG: Freed dequantized src0 buffer\n");
nit: while you're at it, I would also remove these leftover debug printfs, here and in other similar places.
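For readers following the diff: the cudaFreeAsync above releases a temporary that was presumably allocated earlier on the same stream. A hedged sketch of that allocate-dequantize-free pattern (the helper name is ours; it assumes ggml-cuda's ggml_get_to_fp32_cuda dequantization dispatch and the backend's CUDA_CHECK macro):

```cpp
// Hedged sketch, not the PR's exact code: if src0 is not already F32, allocate
// a stream-ordered temporary, dequantize into it, and let the caller release it
// with cudaFreeAsync on the same stream (as in the diff above).
static float * src_as_f32(const ggml_tensor * src, bool & allocated, cudaStream_t stream) {
    if (src->type == GGML_TYPE_F32) {
        allocated = false;
        return (float *) src->data;      // already F32, use the data in place
    }
    const int64_t n = ggml_nelements(src);
    float * buf = nullptr;
    CUDA_CHECK(cudaMallocAsync((void **) &buf, n * sizeof(float), stream));
    const to_fp32_cuda_t to_fp32 = ggml_get_to_fp32_cuda(src->type);
    to_fp32(src->data, buf, n, stream);  // dequantize into the F32 temporary
    allocated = true;
    return buf;                          // caller: cudaFreeAsync(buf, stream)
}
```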
        case GGML_OP_ADD:
        case GGML_OP_SUB:
        case GGML_OP_MUL:
        case GGML_OP_DIV:
-           return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
+           return (op->src[0]->type == GGML_TYPE_F32 || op->src[0]->type == GGML_TYPE_F16) &&
nit: spurious change?
Looks like there are some CI failures also related to these changes - see https://github.com/tetherto/qvac-ext-lib-llama.cpp/actions/runs/17076253696/job/48418341198?pr=5 for example:
The PR adds: